Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox

نویسندگان

Jialei Wang

Weiran Wang

Nathan Srebro

چکیده

We present and analyze statistically optimal, communication and memory efficient distributed stochastic optimization algorithms with near-linear speedups (up to log-factors). This improves over prior work which includes methods with near-linear speedups but polynomial communication requirements (accelerated minibatch SGD) and communication efficient methods which do not exhibit any runtime speedups over a naive single-machine approach. We first analyze a distributed SVRG variant as a distributed stochastic optimization method and show that it can achieve nearlinear speedups with logarithmic rounds of communication, at the cost of high memory requirements. We then present a novel method, stochastic DANE, which trades off memory for communication and still allows for optimization with communication which scales only logarithmically with the desired accuracy while also being memory efficient. Stochastic DANE is based on a minibatch prox procedure, solving a non-linearized subproblem on a minibatch at each iteration. We provide a novel analysis for this procedure which achieves the statistical optimal rate regardless of minibatch size and smoothness, and thus significantly improving on prior work.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Integrated Model, Batch and Domain Parallelism in Training Neural Networks

We propose a new integrated method of exploiting both model and data parallelism for the training of deep neural networks (DNNs) on large distributed-memory computers using minibatch stochastic gradient descent (SGD). Our goal is to find an efficient parallelization strategy for a fixed batch size using P processes. Our method is inspired by the communication-avoiding algorithms in numerical li...

متن کامل

Stochastic Optimization with Importance Sampling

Uniform sampling of training data has been commonly used in traditional stochastic optimization algorithms such as Proximal Stochastic Gradient Descent (prox-SGD) and Proximal Stochastic Dual Coordinate Ascent (prox-SDCA). Although uniform sampling can guarantee that the sampled stochastic quantity is an unbiased estimate of the corresponding true quantity, the resulting estimator may have a ra...

متن کامل

Stochastic Optimization with Importance Sampling for Regularized Loss Minimization

Uniform sampling of training data has been commonly used in traditional stochastic optimization algorithms such as Proximal Stochastic Mirror Descent (prox-SMD) and Proximal Stochastic Dual Coordinate Ascent (prox-SDCA). Although uniform sampling can guarantee that the sampled stochastic quantity is an unbiased estimate of the corresponding true quantity, the resulting estimator may have a rath...

متن کامل

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Deep learning thrives with large neural networks and large datasets. However, larger networks and larger datasets result in longer training times that impede research and development progress. Distributed synchronous SGD offers a potential solution to this problem by dividing SGD minibatches over a pool of parallel workers. Yet to make this scheme efficient, the per-worker workload must be larg...

متن کامل

Improved Optimization of Finite Sums with Minibatch Stochastic Variance Reduced Proximal Iterations

We present novel minibatch stochastic optimization methods for empirical risk minimization problems, the methods efficiently leverage variance reduced first-order and sub-sampled higherorder information to accelerate the convergence speed. For quadratic objectives, we prove improved iteration complexity over state-of-the-art under reasonable assumptions. We also provide empirical evidence of th...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2017

Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox

نویسندگان

چکیده

منابع مشابه

Integrated Model, Batch and Domain Parallelism in Training Neural Networks

Stochastic Optimization with Importance Sampling

Stochastic Optimization with Importance Sampling for Regularized Loss Minimization

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Improved Optimization of Finite Sums with Minibatch Stochastic Variance Reduced Proximal Iterations

عنوان ژورنال:

اشتراک گذاری